Lexical and orthographic distances between Germanic, Romance and Slavic languages and their relationship to geographic distance

Authors

  • Wilbert Heeringa
  • Jelena Golubovic
  • Charlotte Gooskens
  • Anja Schüppert
  • Femke Swarte
  • Stefanie Voigt
Abstract

When reading texts in different but closely related languages, intelligibility is determined by, among other things, the number of words that are cognates of words in the reader's language and by orthographic differences. Orthographic differences partly reflect pronunciation differences and can therefore partly be regarded as a linguistic level. Dialectometric studies in particular have shown that different linguistic levels may correlate with each other and with geography. This raises the question of whether both lexical distance and orthographic distance need to be included in a model that explains written intelligibility, or whether both factors can even be replaced by geographic distance. We study how lexical and orthographic variation among Germanic, Romance and Slavic languages relate to each other and to geography. The lexical distance is the percentage of non-cognate pairs, and the orthographic distance is the average of the Levenshtein distances of the cognate pairs. For each language group we found a significant correlation between lexical and orthographic distances with a medium effect size. Therefore, both factors should preferably be included when modelling written intelligibility. We considered several measures of geographic distance, with languages located at the center or capital of the countries where they are spoken. Both as-the-crow-flies distances and travel distances were considered. The largest effect sizes are obtained when correlating lexical distances with travel distances between capitals and when correlating orthographic distances with as-the-crow-flies distances between capitals. The results show that geographic distance may represent lexical and orthographic distance to some extent in a model of written intelligibility.
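
The two distance measures defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy Dutch-German word list, and the use of raw (unnormalized) Levenshtein distances are assumptions made here for illustration only; the abstract does not say whether the distances were normalized.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_distance(pairs):
    """Percentage of word pairs that are not cognates."""
    non_cognates = sum(1 for _, _, is_cognate in pairs if not is_cognate)
    return 100.0 * non_cognates / len(pairs)


def orthographic_distance(pairs):
    """Mean Levenshtein distance over the cognate pairs only."""
    dists = [levenshtein(a, b) for a, b, is_cognate in pairs if is_cognate]
    return sum(dists) / len(dists)


# Hypothetical toy data: (word in language A, word in language B, cognate?)
toy_pairs = [
    ("huis", "haus", True),
    ("water", "wasser", True),
    ("boek", "buch", True),
    ("fiets", "fahrrad", False),
]

print(lexical_distance(toy_pairs))
print(orthographic_distance(toy_pairs))
```

In this sketch the non-cognate pair contributes only to the lexical distance, while the orthographic distance is computed over the three cognate pairs, which mirrors the division of labor between the two measures described above.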

Similar resources

Modeling Intelligibility of Written Germanic Languages: Do We Need to Distinguish between Orthographic Stem and Affix Variation?

We measured orthographic differences between five Germanic languages. First, we tested the hypothesis that orthographic stem variation among languages does not correlate with orthographic variation in inflectional affixes. We found this hypothesis true when considering the aggregated stem and affix distances between the languages. We also correlated the stem and affix distances of the cognate p...

Pasture Names with Romance and Slavic Roots Facilitate Dissection of Y Chromosome Variation in an Exclusively German-Speaking Alpine Region

The small alpine district of East Tyrol (Austria) has an exceptional demographic history. It was contemporaneously inhabited by members of the Romance, the Slavic and the Germanic language groups for centuries. Since the Late Middle Ages, however, the population of the principally agrarian-oriented area is solely Germanic speaking. Historic facts about East Tyrol's colonization are rare, but sp...

Learning Morphology of Romance, Germanic and Slavic Languages with the Tool Linguistica

In this paper we present preliminary work conducted on semi-automatic induction of inflectional paradigms from non annotated corpora using the open-source tool Linguistica (Goldsmith 2001) that can be utilized without any prior knowledge of the language. The aim is to induce morphology information from corpora such as to compare languages and foresee the difficulty to develop morphosyntactic le...

Reference to Kinds across Languages

NPs occurring in canonical argumental positions) from a crosslinguistic point of view. It is proposed that languages may vary in what they let their NPs denote. In some languages (like Chinese), NPs are argumental (names of kinds) and can thus occur freely without determiner in argument position; in others they are predicates (Romance), and this prevents NPs from occurring as arguments, unless ...

comprehension : linguistic and extralinguistic determinants

The three West-Germanic languages Dutch, Frisian and Afrikaans are so closely related that they can be expected to be mutually intelligible to a large extent. In the present investigation, we established the intelligibility of written Afrikaans and Frisian by Dutch-speaking subjects. It appeared that it is easier for speakers of Dutch to understand Afrikaans than Frisian. In order to explain th...

Publication date: 2013